Evaluation of genomic offset predictions in common gardens
Author
Juliette Archambeau
Published
February 12, 2024
1 Introduction
In this report, we evaluate whether genomic offset (GO) predictions from the different GEA methods (LFMM, GF, GDM and RDA) and SNP sets (control, candidate and all SNPs for LFMM) are good predictors of fitness proxies (height and mortality data) from five clonal common gardens located in Spain (Asturias, Madrid, Cáceres), Portugal (Fundão) and France (Pierroton). For that, we predicted the genomic offset of each population when transplanted in the environment of the common garden (so instead of predicting the genomic offset into future climates, we predict it in the new environment of the common garden, i.e. space-for-time approach). The rational is that populations with higher genomic offset in a given common garden are expected to have lower relative fitness in the common garden.
We also calculate the climatic transfer distances (CTD) between the climate-of-origin of the populations and the climate of the common garden (i.e. the absolute value of the difference between the climate of origin of the populations and the climate in the common garden between the tree planting date and the measurement date). We then estimate the association between tree height and mortality and the climatic transfer distances. The climatic transfer distances are calculated for the same set of climatic variables as the one used to calculate the genomic offset (one CTD per climatic variable).
2 Data
2.1 Genomic offset predictions
We load the genomic offset predictions estimated from the different GEAs.
We load the survival and mortality data from the five common gardens.
Code
pheno_data <-readRDS(file=here("data/CommonGardenData/PhenoDataNovember2019_AnnualTraits_UpdatedSept2021_AllSites.rds")) %>% dplyr::rename(pop=prov)no_nas <-sapply(pheno_data, function(x) length(x)-sum(is.na(x)))list_pheno <-list()list_pheno$`Asturias, Spain (37 months)`<-table(pheno_data$site,pheno_data$AST_survmar14)["asturias",]list_pheno$`Bordeaux, France (85 months)`<-table(pheno_data$site,pheno_data$BDX_surv18)["bordeaux",]list_pheno$`Cáceres, Spain (8 months)`<-table(pheno_data$site,pheno_data$CAC_survdec11)["caceres",]list_pheno$`Madrid, Spain (13 months)`<-table(pheno_data$site,pheno_data$MAD_survdec11)["madrid",]list_pheno$`Fundão, Portugal (27 months)`<-table(pheno_data$site,pheno_data$POR_survmay13)["portugal",]df_exp_design <- list_pheno %>%bind_rows(.id="cg") %>%setNames(c("Common garden (tree age)","Number of dead trees","Number of trees alive")) %>%mutate("Number of height measurements"=c(no_nas[["AST_htmar14"]], no_nas[["BDX_htnov18"]], no_nas[["CAC_htdec11"]], no_nas[["MAD_htdec11"]], no_nas[["POR_htmay13"]]))saveRDS(df_exp_design, file =here("tables/ExpDesign_CG.rds"))df_exp_design %>% kable_mydf
Common garden (tree age)
Number of dead trees
Number of trees alive
Number of height measurements
Asturias, Spain (37 months)
206
3976
3976
Bordeaux, France (85 months)
119
3225
3217
Cáceres, Spain (8 months)
3818
366
340
Madrid, Spain (13 months)
3138
1046
1046
Fundão, Portugal (27 months)
1517
2666
2665
Height and survival measurements in each common garden:
Asturias (Spain) in March 2014 when the trees were 37 month-old (trees were planted in February 2011).
Bordeaux (France) in November 2018 when the trees were 85 month-old (trees were planted in October 2011).
Cáceres (Spain) in December 2011 when the trees were 8 month-old (trees were planted in April 2011). Note that for this common garden, we calculate the bioclimatic variables for the entire year 2011 (instead of calculating the variables only for months between April and December). Indeed, the calculation of the annual bioclimatic variables will be wrong if we do not account for some months, e.g. the mean annual temperature will be higher than expected because we do not account for some winter months.
Madrid (Spain) in December 2011 when the trees were 13 month-old (trees were planted in November 2010).
Fundão (Portugal) in May 2013 when the trees were 27 month-old (trees were planted in February 2011)
3 Mortality models
In this section, we want to determine whether genomic offset (GO) or climate transfer distances (CTD) are associated with the proportion of dead trees in the populations, independently in the five common gardens. We build a model that assumes that the initial sapling height acts as a confounder. Indeed, trees that were higher at the time of planting have a higher probability of survival. This is particularly true in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. Here is the model:
with \(a_{p}\) the count of individuals that died in the population \(p\), \(N_{p}\) the total number of individuals in the population \(p\) (=number of individuals that were initially planted in the common garden), \(p_p\) is the estimated probability of mortality in the population \(p\), \(X_{p}\) is the genomic offset or climatic transfer distance for the population \(p\) and \(H_{p}\) is the BLUPs for height of the population \(p\)(population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022). \(H_{p}\) is used as a proxy of the initial tree height.
We load the population-specific intercepts from the model 1 of Archambeau et al. (2022).
Code
mod_arch2022 <-readRDS(file=here("data/Archambeauetal2022_MOD1.rds"))pop_heights <- mod_arch2022$fit %>% broom.mixed::tidyMCMC(estimate.method ="mean",conf.int = T) %>%# we take the mean of the prov random interceptsfilter(str_detect(term, "^(r_prov\\[)")) %>% dplyr::rename(height=estimate,pop=term) %>%mutate(pop=str_sub(pop,8,-12))pop_heights[1:5,] %>% kable_mydf
pop
height
std.error
conf.low
conf.high
ALT
0.09
0.04
0.01
0.17
ARM
0.12
0.04
0.04
0.20
ARN
-0.09
0.03
-0.16
-0.03
BAY
-0.14
0.03
-0.20
-0.07
BON
-0.04
0.04
-0.12
0.04
We export the height intercepts from Archambeau et al. (2022) to the DRYAD repository.
The response variable is counts of dead trees. To calculate these counts, we load the survival data in the common gardens, in which 0 corresponds to dead trees and 1 to survivors.
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
int nb_tot[N]; // Total number of trees in the population
int nb_dead[N]; // Number of dead trees in the population
vector[N] H; // Mean tree height of the population
vector[N] X; // Genomic offset or climatic transfer distance of the population
}
parameters {
real beta_0;
real beta_H;
real beta_X;
}
model {
nb_dead ~ binomial_logit(nb_tot,beta_0 + beta_H * H + beta_X * X);
beta_0 ~ normal(0,5);//std_normal();
beta_H ~ normal(0,5);//std_normal();
beta_X ~ normal(0,5);//std_normal();
}
// generated quantities{
// vector[N] log_lik;
// // log likelihood for loo
// for (n in 1:N) {
// log_lik[n] = binomial_logit_lpmf( nb_dead[n] | nb_tot[n] , beta_0 + beta_H * H[n] + beta_X * X[n]);
// }
// }
We run the models for each common garden and each genomic offset predictions/climatic transfer distances, so a total of 5 * 15 = 75 models runs.
Code
coefftab <-lapply(unique(survival_data$site),function(site_i){lapply(unique(df$method), function(method_i){# Subset the data for the site i and method isub_data <- survival_data %>%filter(site == site_i) %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) # transform survival data into mortality datasub_data <- df %>%filter(method == method_i & cg == site_i) %>%inner_join(sub_data, by="pop") %>%inner_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))), by="pop") %>%arrange(pop)# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot=sub_data$nb_tot,H=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX))# Running the modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE) # Save the model and the stanlistlist(mod = mod, stanlist = stanlist) %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_i,".rds")))# Save coefficientsbroom.mixed::tidyMCMC(mod,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>%filter(str_detect(term, c('beta'))) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds"))
Below, we plot the mean and 95% credible intervals of:
the \(\beta_X\) coefficients, which stand for the association between the counts of dead trees and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic transfer distances.
the \(\beta_H\) coefficients, which stand for the association between the counts of dead trees and the BLUPs for height in the five common gardens, used as a proxy of the initial seedling height (a confounder in the model).
Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting higher mortality rates in populations with higher GO predictions.
Code
coeff_match <-list(beta_H="Regression coefficients $\\beta_H$",beta_X="Regression coefficients $\\beta_X$")p <-lapply(c("beta_X","beta_H"), function(coeff){p <- coefftab %>%filter(term==coeff) %>%left_join(distinct(df[,c("method_type","method_input","cg","method")]), by=c("method","cg")) %>%mutate(cg=case_when(cg=="asturias"~"Asturias, Spain (37 months)", cg=="bordeaux"~"Bordeaux, France (85 months)", cg=="caceres"~"Cáceres, Spain (8 months)", cg=="madrid"~"Madrid, Spain (13 months)", cg=="portugal"~"Fundão, Portugal (27 months)"),method_input=case_when(method_input=="all"~"All SNPs", method_input=="bio1"~"Mean annual temperature (°C)", method_input=="bio12"~"Annual precipitation (mm)", method_input=="bio15"~"Precipitation seasonality (index)", method_input=="bio3"~"Isothermality (index)", method_input=="bio4"~"Temperature seasonality (°C)", method_input=="cand"~"Candidate SNPs", method_input=="control"~"Control SNPs", method_input=="SHM"~"Summer heat moisture index (°C/mm)")) %>%mutate(method_input=factor(method_input, levels=c("All SNPs","Candidate SNPs","Control SNPs","Mean annual temperature (°C)","Annual precipitation (mm)","Precipitation seasonality (index)","Isothermality (index)","Temperature seasonality (°C)","Summer heat moisture index (°C/mm)")),method_type=factor(method_type, levels=c("GDM","GF","LFMM","RDA","CTD")),cg=factor(cg, levels=c("Asturias, Spain (37 months)","Madrid, Spain (13 months)","Bordeaux, France (85 months)","Cáceres, Spain (8 months)","Fundão, Portugal (27 months)")))# Plots with CTD and SNP sets# ===========================p_allvar <- p %>%ggplot(aes(x = method_type,y = mean,ymin = conf_low, ymax = conf_high,color=method_input,shape=method_input)) + {if(coeff=="beta_X") geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.01)} +geom_hline(yintercept =0,color="gray") +geom_pointinterval(position =position_dodge(width =0.6),point_size=3.5, size=3) +facet_wrap(~cg, ncol=2) +ylab(TeX(coeff_match[[coeff]])) +xlab("") +scale_color_manual(values=colors_coeff)+scale_shape_manual(values =c(rep(17,3),rep(16,6))) +theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),axis.title.x =element_text(size=1),legend.title=element_text(size=13), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=12),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color=guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))# save in pdf and pngggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.pdf")),device="pdf",height=9,width=12)ggsave(p_allvar,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsandCTD.png")),height=9,width=12)# Plots with SNPs only# ====================p_SNPs <- p %>%filter(!method_type =="CTD") %>%ggplot(aes(x = method_type,y = mean,ymin = conf_low, ymax = conf_high,color=method_input,shape=method_input)) + {if(coeff=="beta_X") geom_rect(inherit.aes=FALSE, aes(xmin=-Inf, xmax=Inf, ymin=0, ymax=Inf), color="transparent", fill="green", alpha=0.01)} +geom_hline(yintercept =0,color="gray") +geom_pointinterval(position =position_dodge(width =0.45),point_size=3.5, size=3) +# facet_wrap(~cg, ncol=2) +ylab(TeX(coeff_match[[coeff]])) +xlab("") +scale_color_manual(values=colors_coeff) +scale_shape_manual(values =c(rep(17,3),rep(16,6))) +theme_bw() +labs(color="",shape="") +theme(axis.text.x =element_text(size=13),axis.text.y =element_text(size=13),axis.title.y =element_text(size=16),axis.title.x =element_text(size=1),legend.title=element_text(size=13), strip.text.x =element_text(size =16),strip.background =element_blank(),legend.position =c(0.77,0.15),legend.text=element_text(size=18),panel.grid.minor.x=element_blank(),panel.grid.major.x=element_blank()) +guides(color=guide_legend(ncol=1),shape =guide_legend(override.aes =list(size =2 )))# save in pdf and pngggsave(p_SNPs,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets.pdf")),device="pdf",height=7,width=8)ggsave(p_SNPs,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets.png")),height=7,width=8)# Figure in the main manuscript################################ We want to add some images to represent the climatic differences among common gardensif(coeff=="beta_X"){annotation_custom2 <-function (grob, xmin =-Inf, xmax =Inf, ymin =-Inf, ymax =Inf, data){ layer(data = data, stat = StatIdentity, position = PositionIdentity, geom = ggplot2:::GeomCustomAnn,inherit.aes =TRUE, params =list(grob = grob, xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax))}df_coord_png <- p %>% dplyr::select(cg) %>% distinct %>% dplyr::mutate(x_min =case_when(cg =="Asturias, Spain (37 months)"~4.02, cg =="Bordeaux, France (85 months)"~4.02, cg =="Cáceres, Spain (8 months)"~4.25, cg =="Madrid, Spain (13 months)"~4.23, cg =="Fundão, Portugal (27 months)"~4.09),y_max =case_when(cg =="Asturias, Spain (37 months)"~-0.28, cg =="Bordeaux, France (85 months)"~-0.28, cg =="Cáceres, Spain (8 months)"~-0.45, cg =="Madrid, Spain (13 months)"~-0.43, cg =="Fundão, Portugal (27 months)"~-0.38))list_pngs <-lapply(unique(p$cg), function(cg_i){sub <- p %>%filter(cg==cg_i) %>%filter(!method_type =="CTD") %>%slice(1)png_cg =annotation_custom2(rasterGrob(cloudpng,interpolate=TRUE), ymin =-0.56,ymax= df_coord_png[df_coord_png$cg==cg_i,"y_max"][[1]],xmin =df_coord_png[df_coord_png$cg==cg_i,"x_min"][[1]],xmax =4.55, data=sub)})p_SNPs <- p_SNPs + list_pngs[[1]]+ list_pngs[[2]]+ list_pngs[[3]]+ list_pngs[[4]]+ list_pngs[[5]]# save in pdf and pngggsave(p_SNPs,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/",coeff,"_SNPsets_WithCloudImage.pdf")),device="pdf",height=7,width=8)}return(p_allvar)})p
[[1]]
[[2]]
Interpretation
As expected, the association between mortality rates and the initial tree height was particularly strong in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. More surprisingly, this association was also osberved in Fundão, Portugal. However, the initial tree height was not associated with mortality rates in Bordeaux (France) and Asturias (Spain), which benefit from the favorable climates of the Atlantic region and in which the mortality rates were very low.
In Fundão, Cáceres and Madrid, for most genomic offset predictions, populations experiencing higher mortality rates also showed higher genomic offset. The most consistent associations across the three common gardens were obtained with:
Gradient Forest (GF) with both candidate and control SNPs.
Redundancy analysis (RDA) with both candidate and control SNPs.
Latent factor mixed model (LFMM) with all SNPs or control SNPs.
Interestingly, climatic transfer distances generally explain mortality probability less well than genomic offset predictions, and none of them show a consistent association with the number of dead trees across the three common gardens. Note that it is almost the case for the climatic transfer distance calculated based on the temperature seasonality, which shows a positive association with the counts of dead trees in Madrid and Fundão, and almost in Cáceres.
In Asturias, no association was detected between genomic offset predictions and mortality rates, which may be due to the favorable climatic conditions of this common garden and the associated low mortality rates (only 206 dead trees). Therefore, in this common garden, mortality events are likely to be mostly random events due to other factors than climate. However, we can note that an association was detected between mortality rates and isothermality: populations from areas with high isothermality tend to die more in the common garden of Asturias, Spain.
In Bordeaux, there were even less dead trees than in Asturias (119 dead trees). However, the two nonparametric approach used to predict the genomic offset (GDM and GF) detected a positive association between mortality rates and the genomic offset predictions obtained with both the candidate and the control SNPs. A positive association was also detected with the genomic offset predictions obtained with candidate SNPs with the LFMM approach. On the other hand, none of the genomic offset predictions from the RDA approach were associated with the mortality rates. Interestingly, the only association with a climatic transfer distance was with the temperature seasonality (bio4), which was the most important variable to explain the turnover in allele frequency in the GF and GDM approaches.
3.4 Predictions of tree mortality probability
3.4.1 Plots
Let’s visualize the uncertainty around the estimation of the probability of tree mortality \(p\).
Code
##################################################################################################### Visualizing the relationships between GO predictions or CTDs and mortality probability predictions####################################################################################################lapply(names(cg_names), function(site_i){list(CTDs=names(methods) %>%str_subset("^CTD"),go_methods=names(methods) %>%str_subset("^CTD", negate=T)) %>%lapply(function(method_group_i){myplots <-lapply(method_group_i, function(method_i){# We load the stanliststanlist <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_i,".rds")))$stanlist# We load the modelmod <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_i,".rds")))$mod# We extract the posterior distributions of all parameterspost <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions)x_vec <-seq(min(stanlist$X),max(stanlist$X),0.05)# Function to predict the mortality probability p with the initial tree height fixed to the meanf_p <-function(x) 1/ (exp(-(post$`beta_0`+ post$`beta_H`*mean(stanlist$H) + post$`beta_X`* x)) +1)p_pred <-sapply(x_vec, f_p) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, p, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(p, credMass =0.90)[1],hpdi_h = HDInterval::hdi(p, credMass =0.90)[2],p_mean =mean(p)) %>%ungroup() %>%mutate(x =as.numeric(x))# Keep mean and 90% HDPI of p (one value for each iteration)p_mean_df <- p_pred %>% dplyr::select(x,hpdi_l,hpdi_h,p_mean) %>% dplyr::distinct()# Plots#=======# Plot options y_limits <-c(0,1)if(grepl("CTD",method_group_i[[1]]) == T){x_axis <-"Mean-centered CTD"} else { x_axis <-"Mean-centered GO predictions"}y_axis <-"Probability of tree mortality"p <-ggplot() +ylim(y_limits) +labs(y=y_axis, x=x_axis) +theme_bw()# First 100 posterior draws of the mortality probability pp1 <- p +geom_point(data = p_pred %>%filter(Iter <101),aes(x, p), alpha = .2, color ='dodgerblue')# Mean (line) and 90% HPDI (shaded region) of the mortality probability pp2 <- p +geom_ribbon(data = p_mean_df,aes(x = x, ymin = hpdi_l, ymax = hpdi_h),alpha = .2,fill='dodgerblue') +geom_line(data = p_mean_df,aes(x=x, y=p_mean), col="dodgerblue4")p1_p2 <-ggarrange(p1, p2 +labs(y=""), nrow =1)# Titletitle <-ggdraw() +draw_label(paste0(cg_names[[site_i]]," - ",methods[[method_i]]),fontface ='bold',x =0, hjust =0) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plotsp1_p2 <-plot_grid(title, p1_p2, ncol =1, rel_heights =c(0.1, 1))p2 <- p2 +labs(subtitle = methods[[method_i]])ggsave(p1_p2,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/MortalityProbabilityPredictions_UncertaintyIntervals/", method_i,"_",site_i,".pdf")),device="pdf",height=6,width=11)return(p2)})if(grepl("CTD",method_group_i[[1]]) == T){ group_name <-"CTDs" p <-plot_grid(myplots[[1]] +labs(x=""), myplots[[2]] +labs(y="",x=""), myplots[[3]] +labs(x=""), myplots[[4]] +labs(y="",x=""), myplots[[5]], myplots[[6]] +labs(y=""),ncol=2) fig_width <-10} else { group_name <-"GOpred" p <-plot_grid(myplots[[1]] +labs(x=""), myplots[[2]] +labs(y="",x=""), myplots[[3]] +labs(y="",x=""), myplots[[4]] +labs(x=""), myplots[[5]] +labs(y="",x=""), myplots[[6]] +labs(y="",x=""), myplots[[7]], myplots[[8]] +labs(y=""), myplots[[9]] +labs(y=""),ncol=3) fig_width <-12 }ggsave(p,file=here(paste0("figs/ValidationCommonGarden/MortalityModels/MortalityProbabilityPredictions_",group_name,"_",site_i,".pdf")),device="pdf",height=12,width=fig_width)})})
3.4.2 Table
In the table below:
mean and 90% credible intervals (highest posterior density intervals) of \(p\) for \(x=\{-1, 0, 1\}\).
percent change in \(p\) associated with a one-unit increase in \(x\) (which corresponds to a one-standard deviation increase).
Code
p_pred <-lapply(names(cg_names), function(site_i){lapply(names(methods), function(method_i){# We load the stanliststanlist <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_i,".rds")))$stanlist# We load the modelmod <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/MortalityModels/stan_models/", site_i,"_",method_i,".rds")))$mod# We extract the posterior distributions of all parameterspost <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions)x_vec <-c(-1,0,1)# Function to predict the mortality probability p with the initial tree height fixed to the meanf_p <-function(x) 1/ (exp(-(post$`beta_0`+ post$`beta_H`*mean(stanlist$H) + post$`beta_X`* x)) +1)p_pred <-sapply(x_vec, f_p) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, p, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(p, credMass =0.90)[1], # Highest posterior density intervalshpdi_h = HDInterval::hdi(p, credMass =0.90)[2]# pi_l = rethinking::PI(p, prob = 0.90)[1], # Highest posterior percentile intervals# pi_h = rethinking::PI(p, prob = 0.90)[2] ) %>%mutate(p_mean =mean(p)) %>%ungroup() %>%mutate(x =as.numeric(x)) %>% dplyr::select(x,p_mean,hpdi_l,hpdi_h) %>%distinct() %>%round(3)delta_p_pos <- (p_pred$p_mean[p_pred$x==1] *100/ p_pred$p_mean[p_pred$x==0])-100delta_p_neg <- (p_pred$p_mean[p_pred$x==-1] *100/ p_pred$p_mean[p_pred$x==0])-100tibble(Site = cg_names[[site_i]],"Method"= methods[[method_i]],"p(x=-1)"=paste0(p_pred$p_mean[p_pred$x==-1], " [", p_pred$hpdi_l[p_pred$x==-1],";", p_pred$hpdi_h[p_pred$x==-1],"]"),"p(x=0)"=paste0(p_pred$p_mean[p_pred$x==0], " [", p_pred$hpdi_l[p_pred$x==0],";", p_pred$hpdi_h[p_pred$x==0],"]"),"p(x=1)"=paste0(p_pred$p_mean[p_pred$x==1], " [", p_pred$hpdi_l[p_pred$x==1],";", p_pred$hpdi_h[p_pred$x==1],"]"),"+Δp"=paste0(round(delta_p_pos,3),"%"),"-Δp"=paste0(round(delta_p_neg,3),"%"))}) %>%bind_rows()}) %>%bind_rows()p_pred %>%saveRDS(file =here("tables/MortalityPredCGs.rds"))
We export in a table with the number and proportion of dead trees for each population in each common garden.
Code
ExpDesignTab <-lapply(unique(survival_data$site),function(site_i){survival_data %>%filter(site==site_i) %>% dplyr::select(pop,survival) %>%drop_na() %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) %>%mutate(prop_dead=nb_dead*100/nb_tot)}) %>%setNames(unique(survival_data$site)) %>%bind_rows(.id="cg") %>%pivot_wider(names_from=cg,values_from =c(nb_dead, nb_tot,prop_dead),names_sep="_") %>% dplyr::select(pop,contains("asturias"),contains("bordeaux"),contains("caceres"),contains("madrid"),contains("portugal"))# To generate a latex table# =========================# print(xtable(ExpDesignTab, type = "latex",digits=2), # file = paste0(here("tables/ExpDesignMortalityModelsPerPopCG.tex")), # include.rownames=FALSE)# Export the table for the Supplemetary Information# =================================================ExpDesignTab %>% dplyr::select(-contains("prop_dead")) %>%saveRDS(here("tables/ExpDesignMortalityModelsPerPopCG.rds"))# Show table# ==========ExpDesignTab %>%kable_mydf()
pop
nb_dead_asturias
nb_tot_asturias
prop_dead_asturias
nb_dead_bordeaux
nb_tot_bordeaux
prop_dead_bordeaux
nb_dead_caceres
nb_tot_caceres
prop_dead_caceres
nb_dead_madrid
nb_tot_madrid
prop_dead_madrid
nb_dead_portugal
nb_tot_portugal
prop_dead_portugal
ALT
3
72
4.17
1
53
1.89
69
72
95.83
52
72
72.22
19
72
26.39
ARM
1
64
1.56
1
64
1.56
56
64
87.50
49
64
76.56
23
64
35.94
ARN
5
136
3.68
3
109
2.75
129
136
94.85
107
136
78.68
46
136
33.82
BAY
8
144
5.56
3
142
2.11
132
144
91.67
113
144
78.47
56
144
38.89
BON
2
71
2.82
1
48
2.08
62
72
86.11
48
72
66.67
14
72
19.44
CAD
5
80
6.25
3
70
4.29
74
80
92.50
62
80
77.50
22
80
27.50
CAR
0
48
0.00
0
46
0.00
44
48
91.67
36
48
75.00
21
48
43.75
CAS
4
80
5.00
6
78
7.69
78
80
97.50
66
80
82.50
31
80
38.75
CEN
2
72
2.78
1
38
2.63
66
72
91.67
50
72
69.44
18
72
25.00
COC
6
144
4.17
3
109
2.75
137
144
95.14
118
144
81.94
53
143
37.06
COM
0
32
0.00
3
32
9.38
28
32
87.50
21
32
65.62
10
32
31.25
CUE
9
224
4.02
9
194
4.64
211
224
94.20
186
224
83.04
101
224
45.09
HOU
11
208
5.29
6
160
3.75
186
208
89.42
145
208
69.71
66
208
31.73
LAM
3
72
4.17
4
60
6.67
68
72
94.44
57
72
79.17
27
72
37.50
LEI
14
184
7.61
10
143
6.99
162
184
88.04
147
184
79.89
64
184
34.78
MAD
0
8
0.00
0
8
0.00
8
8
100.00
5
8
62.50
2
8
25.00
MIM
7
144
4.86
4
126
3.17
135
144
93.75
116
144
80.56
67
144
46.53
OLB
3
176
1.70
4
116
3.45
149
176
84.66
113
176
64.20
27
176
15.34
OLO
22
191
11.52
3
132
2.27
173
192
90.10
145
192
75.52
73
192
38.02
ORI
4
208
1.92
2
183
1.09
196
208
94.23
162
208
77.88
67
208
32.21
PET
11
192
5.73
3
147
2.04
171
192
89.06
136
192
70.83
75
192
39.06
PIA
8
128
6.25
1
110
0.91
111
128
86.72
91
128
71.09
43
128
33.59
PIE
3
72
4.17
4
55
7.27
67
72
93.06
50
72
69.44
17
72
23.61
PLE
8
160
5.00
9
138
6.52
155
160
96.88
131
160
81.88
71
160
44.38
PUE
1
64
1.56
5
56
8.93
58
64
90.62
50
64
78.12
30
64
46.88
QUA
5
136
3.68
3
117
2.56
123
136
90.44
83
136
61.03
36
136
26.47
SAC
1
72
1.39
2
51
3.92
65
72
90.28
56
72
77.78
35
72
48.61
SAL
7
112
6.25
0
69
0.00
104
112
92.86
85
112
75.89
55
112
49.11
SEG
12
168
7.14
7
168
4.17
154
168
91.67
135
168
80.36
77
168
45.83
SIE
2
64
3.12
1
47
2.13
59
64
92.19
40
64
62.50
29
64
45.31
STJ
15
224
6.70
3
180
1.67
190
224
84.82
154
224
68.75
70
224
31.25
TAM
8
120
6.67
4
71
5.63
114
120
95.00
107
120
89.17
60
120
50.00
VAL
5
96
5.21
5
64
7.81
87
96
90.62
69
96
71.88
29
96
30.21
VER
11
216
5.09
5
160
3.12
197
216
91.20
153
216
70.83
83
216
38.43
Code
# Information used in the manuscriptExpDesignTab[,-1] %>% dplyr::summarise_all(mean) %>% kable_mydf
nb_dead_asturias
nb_tot_asturias
prop_dead_asturias
nb_dead_bordeaux
nb_tot_bordeaux
prop_dead_bordeaux
nb_dead_caceres
nb_tot_caceres
prop_dead_caceres
nb_dead_madrid
nb_tot_madrid
prop_dead_madrid
nb_dead_portugal
nb_tot_portugal
prop_dead_portugal
6.06
123
4.27
3.5
98.35
3.7
112.29
123.06
91.65
92.29
123.06
74.31
44.62
123.03
35.79
4 Height models
In this section, we estimate the association between genomic offset (GO) predictions or climate transfer distances (CTD) and tree height, independently in the five common gardens. We compare four different mathematical models.
We first subset height measurements from the dataset (below the first five rows of the dataset are shown).
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N; // Number of individuals
vector[N] Y; // Response variable (individual tree height)
vector[N] X; // Genomic offset or climatic transfer distances
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X);
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
coefftab <-lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method), function(method_i){ df_sub <- df %>%filter(method == method_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block))) mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars=params_to_estimate) # Model coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M1.rds"))
4.1.2 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\) and \(\beta_{X_2}\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M1")
[[1]]
[[2]]
[[3]]
4.1.3 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i="M1")
$asturias
$bordeaux
$caceres
$madrid
$portugal
4.1.4 Interpretation
Interpretation
Higher genomic offset predictions are consistently associated with lower tree height only in Asturias. Most genomic offset predictions are also associated with lower tree height in Bordeaux.
In Fundão, Cáceres and Madrid, the association between genomic offset predictions and tree height are often in the opposite direction as expected.
Interestingly, the climatic transfer distances based in the mean annual temperature and the annual precipitation are negatively associated with tree height in all common gardens (except Cáceres for the CTD based on annual precipitation). These CTD may therefore be better predictors of tree height in common gardens than the genomic offset predictions.
Finally, I think that using tree height as a proxy for fitness to evaluate genomic offset predictions in common gardens may not be appropriate in maritime pine because this trait has a very low genetic-by-environment interaction (see previous papers). Therefore, the association between genomic offset predictions and tree height in Asturias and Bordeaux may not be due to a higher fitness of the trees that are best adapted to the climate in these common gardens, but more probably to height differences that are common across all environments (i.e. populations from Atlantic climates are generally taller). This is confirmed by the models below in which the population-specific BLUPs for height across all common gardens was included as a confounder. When included, the association between genomic offset predictions and tree height in Asturias almost disappears and either disappears or goes in the opposite direction as expected in Bordeaux. Except the genomic offset predictions based on the RDA remain negatively associated with tree height in Bordeaux and Asturias.
Therefore, I am not sure we can go very far in this validation step. We may retain that genomic offset predictions seem to show the most consistent association with tree height when generated with the RDA, and that this association is robust (though smaller) even when the fixed genetic differences across populations are included as confounder.
4.2 Model 2
4.2.1 Model equation and code
Model 2 = Model 1 + Initial height as a confounder.
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real classic_R2; // classic R2 to evaluate the goodness of fit of the model
mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
classic_R2 = 1 - variance(Y - mu) / variance(Y); // classical R2
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
beta_H ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
generated quantities{
// Bayesian R2
real bayes_R2_res; // Residual-based based R2 (uses draws from the residual distribution)
real bayes_R2_mod; // Model-based R2 (uses draws from the modeled residual variances)
bayes_R2_res = variance(mu) / (variance(mu) + variance(Y-mu));
bayes_R2_mod = variance(mu) / (variance(mu) + square(sigma));
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
}
coefftab <-lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method), function(method_i){# Subset the data df_sub <- df %>%filter(method == method_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the models mod %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2_",site_i,"_",method_i,".rds")))# Extract coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M2.rds"))
4.2.2 Proportion of variance explained
The proportion of variance explained by the height models was estimated with the classical \(\mathcal{R}^{2}\), the residual-based Bayesian \(\mathcal{R}^{2}\) and the model-based Bayesian \(\mathcal{R}^{2}\).
The classical \(\mathcal{R}^{2}\) is calculated as follows:
where \(Var_\mu\) is the variance of the modelled predictive means and \(Var_{res}\) is the residual variance.
In the residual-based Bayesian \(\mathcal{R}^{2}\), \(Var_{res}\) comes from draws from the residual distribution: \(Var_{res} = Var(Y-\mu)\). In the model-based Bayesian \(\mathcal{R}^{2}\), \(Var_{res}\) comes from the posterior quantities of the fitted model such as \(Var_{res} = \sigma^2\). As stated in , this Bayesian version of \(\mathcal{R}^{2}\) can be considered as a data-based estimate of the proportion of variance explained for new data.
In the main manuscript, we report results from the model-based Bayesian \(\mathcal{R}^{2}\).
4.2.3 Running model without predictor
Code
stancode =stan_model(here("scripts/StanModels/ValidationCommonGarden_HeightModel_M2_WithoutPredictor.stan"))print(stancode)params_to_estimate <-c("beta_H","sigma","alpha_bloc","classic_R2","bayes_R2_mod","bayes_R2_res")coefftab <-lapply(unique(height_data$cg), function(site_i){# Subset the data df_sub <- df %>%filter(method =="CTD_bio1"& cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Run the models mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Save the models mod %>%saveRDS(file=here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2_",site_i,"_WithoutPredictor.rds")))# Extract coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method="WithoutPredictor",.before=1) }) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M2_WithoutPredictor.rds"))
4.2.4 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M2")
[[1]]
[[2]]
[[3]]
[[4]]
[[5]]
[[6]]
4.2.5 Vizualising predicted associations
4.2.5.1 Mean height predictions
We plot the mean predicted relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M2")
$asturias
$bordeaux
$caceres
$madrid
$portugal
4.2.5.2 Height predictions with uncertainty intervals
The graphs above do not show the estimate uncertainty. We generate some graphs to visualize the uncertainty around the estimates :
the uncertainty in the mean estimates (i.e. variability in posterior draws of the linear predictor \(\mu\))
the uncertainty in tree height predictions (i.e. after accounting for \(\sigma\) in the predictions).
Code
############################################################################## Visualizing the relationships between GO predictions and height predictions#############################################################################lapply(unique(height_data$cg), function(site_i){list(CTDs=names(methods) %>%str_subset("^CTD"),go_methods=names(methods) %>%str_subset("^CTD", negate=T)) %>%lapply(function(method_group_i){myplots <-lapply(method_group_i, function(method_i){# Subset the datadf_sub <- df %>%filter(method == method_i & cg == site_i)sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Data in a list for Stanstanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),bloc =as.numeric(as.factor(sub_data$block)))# Loading the modelmod <-readRDS(file =here(paste0("outputs/ValidationCommonGarden/HeightModels/stan_models/M2_",site_i,"_",method_i,".rds")))# we extract the posterior distributions of all parameterspost <-as.data.frame(mod)# Vector of predictor values (based on the min and max of the GO predictions)x_vec <-seq(min(stanlist$X),max(stanlist$X),0.05)# Predicting the linear predictor mu (predicting mean-centered height without sigma uncertainty)################################################################################################# we extract the posterior draws of mu, and its mean and HDPIs for each predictor value# Function to predict mean-centered height in block 2 and with the initial tree height fixed to the meanf_mu <-function(x) post$`alpha_bloc[2]`+ post$`beta_X1`* x + post$`beta_X2`* x^2+ post$`beta_H`*mean(sub_data$init_height) mu_pred <-sapply(x_vec, f_mu) %>% tibble::as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, y, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(y, credMass =0.90)[1],hpdi_h = HDInterval::hdi(y, credMass =0.90)[2]) %>%mutate(mu_mean =mean(y)) %>%ungroup() %>%mutate(x =as.numeric(x))# Keep mean and 90% HDPI of mu (one value for each iteration)mu_mean_df <- mu_pred %>% dplyr::select(x,mu_mean,hpdi_l,hpdi_h) %>% dplyr::distinct()# Predicting mean-centered height with sigma uncertainty)########################################################## we extract the posterior draws of height predictions, and its mean and PIs for each predictor valuey_pred <-sapply(x_vec,function(x)rnorm(NROW(post), post$`alpha_bloc[2]`+ post$`beta_X1`* x + post$`beta_X2`* x^2+ post$`beta_H`*mean(sub_data$init_height), post$sigma)) %>%as_tibble() %>%rename_all(function(x) x_vec) %>%mutate(Iter =row_number()) %>%gather(x, y, -Iter) %>%group_by(x) %>%mutate(hpdi_l = HDInterval::hdi(y, credMass =0.90)[1],hpdi_h = HDInterval::hdi(y, credMass =0.90)[2]# pi_l = rethinking::PI(y, prob = 0.90)[1],# pi_h = rethinking::PI(y, prob = 0.90)[2] ) %>%ungroup() %>%mutate(x =as.numeric(x)) %>% dplyr::select(x,hpdi_l,hpdi_h) %>% dplyr::distinct()# Plots######## Plot options y_limits <-c(-3,3)if(grepl("CTD",method_group_i[[1]]) == T){x_axis <-"Mean-centered CTD"} else { x_axis <-"Mean-centered GO predictions"}y_axis <-"Mean-centered tree height"p <-ggplot() +ylim(y_limits) +labs(y=y_axis, x=x_axis) +theme_bw()# First 100 posterior draws of the linear predictor mup1 <- p +geom_point(data = mu_pred %>%filter(Iter <101),aes(x, y), alpha = .2, color ='dodgerblue')# Mean (line) and 90% HPDI (shaded region) of the linear predictor mup2 <- p +geom_ribbon(data = y_pred,mapping =aes(x=x, ymin=hpdi_l, ymax=hpdi_h), alpha = .1, fill='dodgerblue') +geom_ribbon(data = mu_mean_df,aes(x = x, ymin = hpdi_l, ymax = hpdi_h),alpha = .2,fill='dodgerblue') +geom_line(data = mu_mean_df,aes(x=x, y=mu_mean), col="dodgerblue4")p1_p2 <-ggarrange(p1, p2 +labs(y=""), nrow =1)# Titletitle <-ggdraw() +draw_label(paste0(cg_names[[site_i]]," - ",methods[[method_i]]),fontface ='bold',x =0, hjust =0) +theme(plot.margin =margin(0, 0, 0, 7))# merge title and plotsp1_p2 <-plot_grid(title, p1_p2, ncol =1, rel_heights =c(0.1, 1))p2 <- p2 +labs(subtitle = methods[[method_i]])ggsave(p1_p2,file=here(paste0("figs/ValidationCommonGarden/HeightModels/HeightPredictions_MuPosteriorDraws_UncertaintyIntervals/", method_i,"_",site_i,"_M2.pdf")),device="pdf",height=6,width=11)return(p2)})if(grepl("CTD",method_group_i[[1]]) == T){ group_name <-"CTDs" p <-plot_grid(myplots[[1]] +labs(x=""), myplots[[2]] +labs(y="",x=""), myplots[[3]] +labs(x=""), myplots[[4]] +labs(y="",x=""), myplots[[5]], myplots[[6]] +labs(y=""),ncol=2) fig_width <-10} else { group_name <-"SNPsets" p <-plot_grid(myplots[[1]] +labs(x=""), myplots[[2]] +labs(y="",x=""), myplots[[3]] +labs(y="",x=""), myplots[[4]] +labs(x=""), myplots[[5]] +labs(y="",x=""), myplots[[6]] +labs(y="",x=""), myplots[[7]], myplots[[8]] +labs(y=""), myplots[[9]] +labs(y=""),ncol=3) fig_width <-12 }ggsave(p,file=here(paste0("figs/ValidationCommonGarden/HeightModels/HeightPredictions_", group_name,"_",site_i,"_M2.pdf")),device="pdf",height=12,width=fig_width)})})
4.3 Model 3
4.3.1 Model equation and code
Model 3 = Model 2 + Initial height as a confounder + Clone fixed intercepts
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
int<lower=0> nb_clon; // Number of clones
int<lower=0, upper=nb_clon> clon[N]; // Clones
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
vector[nb_clon] alpha_clon; // Intercepts of the clones
real beta_0; // Global intercept
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Linear coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
mu = beta_0 + alpha_bloc[bloc] + alpha_clon[clon] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
sigma ~ exponential(1);
alpha_bloc ~ std_normal();
alpha_clon ~ std_normal();
beta_0 ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
beta_H ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
coefftab <-lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method), function(method_i){ df_sub <- df %>%filter(method == method_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Datalist for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),nb_clon =length(unique(sub_data$clon)),bloc =as.numeric(as.factor(sub_data$block)),clon =as.numeric(as.factor(sub_data$clon)))# Running the model mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Model coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M3.rds"))
4.3.2 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M3")
[[1]]
[[2]]
[[3]]
[[4]]
4.3.3 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M3")
$asturias
$bordeaux
$caceres
$madrid
$portugal
4.4 Model 4
4.4.1 Model equation and code
Model 4 = Model 2 + Initial height as a confounder + Clone varying intercepts
\(Y_{ipb}\) is the height of the individual \(i\) in the population \(p\) and block \(b\).
\(\sigma^{2}\) is the residual variance of the model.
\(B_b\) are the block intercepts.
\(X_p\) is the GO or CTD of the population \(p\), with \(\beta_{X1}\) and \(\beta_{X2}\) being its linear and quadratic coefficients, respectively. The quadratic term was included to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
\(H_p\) is a proxy of the initial height of the trees from population \(p\), i.e. when trees were planted. For that, we used the population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. (2022).
\(\beta_0\) is the model global intercept.
\(C_c\) are the clone varying intercepts which follow a normal distribution of variance \(\sigma_C^2\).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] Y; // Response variable (individual height)
vector[N] X; // Genomic offset or climatic transfer distances
vector[N] H; // Initial height of the populations
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
int<lower=0> nb_clon; // Number of clones
int<lower=0, upper=nb_clon> clon[N]; // Clones
}
parameters {
vector[nb_bloc] alpha_bloc; // Intercepts of the blocks
real beta_0; // Global intercept
real beta_X1; // Linear coefficent of GO or CTD
real beta_X2; // Quadratic coefficent of GO or CTD
real beta_H; // Coefficient of the initial height of the populations
real<lower = 0> sigma; // Residual variance of the model
real<lower = 0> sigma_clon; // Variance of the clone intercepts
vector[nb_clon] z_clon; // Vector for non-centered parameterization
}
transformed parameters {
vector[N] mu; // Linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
vector[nb_clon] alpha_clon; // Intercepts of the clones
alpha_clon = z_clon*sigma_clon;
mu = beta_0 + alpha_bloc[bloc] + alpha_clon[clon] + beta_X1 * X + beta_X2 * square(X) + beta_H * H;
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
// Likelihood
Y ~ normal(mu, sigma);
// Priors
beta_0 ~ normal(mean(Y),2);
sigma ~ exponential(1);
sigma_clon ~ exponential(1);
alpha_bloc ~ std_normal();
z_clon ~ std_normal();
beta_H ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
// generated quantities{
// // log likelihood for loo
// vector[N] log_lik;
// vector[N] muhat;
// for (n in 1:N) {
// log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma);
// muhat[n] = normal_rng(mu[n], sigma);
// }
// }
coefftab <-lapply(unique(height_data$cg), function(site_i){lapply(unique(df$method), function(method_i){ df_sub <- df %>%filter(method == method_i & cg == site_i) sub_data <- height_data %>%filter(cg == site_i) %>%left_join(df_sub, by =c("cg","pop")) %>%left_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))) %>% dplyr::rename(init_height=height), by="pop")# Datalist for Stan stanlist <-list(N =nrow(sub_data),Y=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX),H=(sub_data$init_height -mean(sub_data$init_height))/sd(sub_data$init_height),nb_bloc =length(unique(sub_data$block)),nb_clon =length(unique(sub_data$clon)),bloc =as.numeric(as.factor(sub_data$block)),clon =as.numeric(as.factor(sub_data$clon)))# Running the model mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE,pars = params_to_estimate) # Model coefficients broom.mixed::tidyMCMC(mod,pars=params_to_estimate,droppars =NULL, robust =FALSE, # give mean and standard deviationess = F, rhat = F, conf.int = T,conf.level =0.95) %>% dplyr::rename(mean=estimate,std_deviation=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/HeightModels/coefftab_M4.rds"))
4.4.2 Model coefficients
We plot the mean and 95% credible intervals of the \(\beta_{X_1}\), \(\beta_{X_2}\) and \(\beta_H\) coefficients. Graph titles include the time in months corresponding to the age at which height and survival were recorded. Coefficients in the green area have the expected sign, reflecting lower height in populations with higher GO predictions.
Code
generate_interval_plots(model_i ="M4")
[[1]]
[[2]]
[[3]]
[[4]]
4.4.3 Predicted quadratic relationships
We plot the predicted quadratic relationships between tree height and GO predictions / CTD.
Code
generate_poly_plots(model_i ="M4")
$asturias
$bordeaux
$caceres
$madrid
$portugal
4.5 Sample size for each clone
We look at the number of height measurements for each clone. In the tables below (one for each common garden), the first column correspond to the number of height measurements and the second column correspond to the number of clones with this number of measurements.
Archambeau, Juliette, Marta Benito Garzón, Frédéric Barraquand, Marina de Miguel, Christophe Plomion, and Santiago C González-Martı́nez. 2022. “Combining Climatic and Genomic Data Improves Range-Wide Tree Height Growth Prediction in a Forest Tree.”The American Naturalist 200 (4): E141–59.
Fitzpatrick, Matthew C, Vikram E Chhatre, Raju Y Soolanayakanahally, and Stephen R Keller. 2021. “Experimental Support for Genomic Prediction of Climate Maladaptation Using the Machine Learning Approach Gradient Forests.”Molecular Ecology Resources.